Comparative transcriptome profiling of high and low oil yielding Santalum album L

East Indian Sandalwood (Santalum album L.) is highly valued for its heartwood and its oil. There have been no efforts to comparative study of high and low oil yielding genetically identical sandalwood trees grown in similar climatic condition. Thus we intend to study a genome wide transcriptome analysis to identify the corresponding genes involved in high oil biosynthesis in S. album. In this study, 15 years old S. album (SaSHc and SaSLc) genotypes were targeted for analysis to understand the contribution of genetic background on high oil biosynthesis in S. album. A total of 28,959187 and 25,598869 raw PE reads were generated by the Illumina sequencing. 2.12 million and 1.811 million coding sequences were obtained in respective accessions. Based on the GO terms, functional classification of the CDS 21262, & 18113 were assigned into 26 functional groups of three GO categories; (4,168; 3,641) for biological process (5,758;4,971) cellular component and (5,108;4,441) for molecular functions. Total 41,900 and 36,571 genes were functionally annotated and KEGG pathways of the DEGs resulted 213 metabolic pathways. In this, 14 pathways were involved in secondary metabolites biosynthesis pathway in S. album. Among 237 cytochrome families, nine groups of cytochromes were participated in high oil biosynthesis. 16,665 differentially expressed genes were commonly detected in both the accessions (SaHc and SaSLc). The results showed that 784 genes were upregulated and 339 genes were downregulated in SaHc whilst 635 upregulated 299 downregulated in SaSLc S. album. RNA-Seq results were further validated by quantitative RT-PCR. Maximum Blast hits were found to be against Vitis vinifera. From this study, we have identified additional number of cytochrome family in high oil yielding sandalwood accessions (SaHc). The accessibility of a RNA-Seq for high oil yielding sandalwood accessions will have broader associations for the conservation and selection of superior elite samples/populations for further genetic improvement program.


Introduction
East Indian Sandalwood (Santalum album L; Family; Santalaceae) is evergreen hemi-parasitic perennial tree. S. album trees are found in semi-arid regions from India to the South pacific and the northern coast of Australia besides the Hawaii islands [1]. The economic value of sandalwood depends on the quantity of heartwood and its essential oil extracted from the heartwood as well roots of the mature trees of santalum spps. [2][3][4][5]. It has been used for perfumery, cosmetics, pharmaceutical, religious and cultural purposes over centuries [6]. Indian government categorized S. album as one of 32 recognized medicinal plant [7]. The essential oil is very important trait, which is subjected to host species, soil type, climatic effects and elite germplasm [8][9][10][11][12]. However the limited oil yield of sandalwood restricts the demand of oil. The sandalwood oil formation is independent of heartwood growth and it was assumed that constant amount of oil being formed nevertheless of trees/heartwood growth, similar age of trees and with the smaller diameter heartwood consisting trees may tend to have greater percentage of oil. The quality of oil is largely defined by the percentage of different fragrant sesquiterpenes within the oil, especially α and β santalol [5]. Out of other santalum species, S. album is valued as a source of high content of oil as it has high level of α and β santalol and it shows low variability in oil composition across its natural range [13]. Due to international demand for sandalwood heartwood and its oil, over the recent times S. album has been considered as private investment to develop a sandalwood industry [14]. Excessive harvest, habitat destruction and lack of pest management system, global sandalwood resources are threated globally which indicated the large-scale shortage and escalation the market price of sandalwood products [4,15,16]. Realizing the sharp decline in the sandalwood population, the Karnataka and Tamil Nadu Forest department amended the sandalwood act in 2001 and 2002 and declared the private sandalwood growers himself an owner of the sandalwood as per the amended Act. Further, Govt. of Karnataka made an amendment on the sale of sandalwood through Forest department and Government, Departments to eliminate the clandestine trade and to encourage farmers to take cultivation of Sandalwood on commercial scale during the last few years [7]. Due to the amendment, many of the private organizations and farmers have started raising sandalwood cultivation on their private/farm lands. Since sandalwood plantation is long term high investment by the farmers and forest department, so it is essential to identify and supply superior quality planting material to optimize the high economic returns than their investment. The breeding improvement is little due to its long generation time and lack of information about high oil yielding accessions/populations. Considering the constant increasing the global demand for sandalwood oil and genetic improvement purposes, the identification of factors regulating these qualitative and quantitative variations in oil is a critical issue. It was hypothesized that accumulation of sandalwood oil is a complex and dynamic process, which influenced by multiple genetic and environmental factors [17]. Candidate oil biosynthesizing genes, multiomics, trait associated mapping have been performed to investigate the mechanism of oil biosynthesis and accumulation. With the advancement of high throughput sequencing technology, several transcriptome profiling of studies have been carried out in sandalwood [18][19][20][21][22]. Although earlier studies showed that sandalwood oil biosynthesis pathways, identification of key oil biosynthesis genes (Cytochrome P450, Sesquisabinene synthases, and Sesquiterpene synthases), there are very few references available on transcriptomic oil biosynthesis regulation and accumulation. As such there are no any studies pertaining on transcriptomic regulation of sandalwood clones grown in identical environmental conditions. In this study, we performed comparative transcriptomic profiling of two identical accessions that differ significantly in oil content to understand the dynamic regulation of high and low oil accumulation. Understanding the high and low oil variants of the trees, as even a slight percentage improvement in sandalwood oil content will lead to significant value [23,24]. Our results provide new insight for better understanding of how to achieve more sandalwood oil production by manipulation of core pathways and gene involved.

Sampling site
The selection of S. album samples for transcriptome analysis was grounded on three factors [1] known age and [2] grown in identical environmental condition [3] diseased free trees. Therefore we selected 15 year old S. album trees grown in Institute of Wood Science and Technology (13.

Sample collection
For oil estimation and RNA isolation, the wood samples were collected up to GBH at 1.37M by using conventional drilling increment borer (leaf materials were takes as a positive control in RNA extraction process). The four core samples (two replicates of each sample) were marked as transition zone, heartwood and sapwood and frozen into liquid nitrogen. The samples were immediately stored in dry ice box and shipped to the Eurofins laboratory. Before RNA extraction from the core samples, the oil quantity and quality was estimated by UV-spectrophotometer followed by GC-MS analysis. Based on the oil variability in terms of high and low oil-yielding (SaSHc and SaSLc) samples were selected for De novo transcriptome analysis (S1 Table).

RNA isolation, cDNA library preparation and sequencing
The total RNA was extracted from transition zones of the selected cores samples by using modified CTAB and LiCl method [25,26] and to validate the RNA quality and quantity, sandalwood leaves were used as a positive control. The quality of isolated RNA measured by UV spectrophotometer at 260/280and 260/230 nm wavelengths and 1% agarose gel electrophoresis followed by measuring RNA concentration using a 2100 Bioanalyzer (Agilent Technologies). The concentration of RNA was obtained in SaSHc 1460.90 ng/μl and in SaSLc 12.65 ng/μl. The mRNA from the total RNA was extracted by using the poly-T attached magnetic beads, followed by fragmentation process. The cDNA library of S. album was constructed using 2 μL of total purified mRNA from each sample by using Illumina TruSeq stranded mRNA preparation kit. 1st strand cDNA conversion was carried out by using Superscript II and Act-D mix to facilitate RNA dependent synthesis and then second strand was synthesized by using second strand mix. The dscDNA was purified by using AMPure XP beads followed by A-tailing adapter ligation. The libraries were analyzed through 4200 TapeStation system (Agilent Technologies) by using high sensitivity D1000 screen tape. The Pairing end Illumina libraries were loaded on NextSeq500 for cluster generation and sequencing. Total two RNA libraries were generated with the paired end sequencing. To obtain high quality concordant reads the sequenced raw data were processed by Trimmomatic v0.38 [27]. In-house script (in python and R) software was used to remove adapters, ambiguous reads and low quality sequences and the high quality paired-end reads were used for De novo Transcriptome assembly. RNA-Seq data were produced in FASTQ format and the whole sequence reads archive (SRA) database has been deposited in NCBI under Biosample accession: SAMN1569426 SRA accession number: PRJNA648820.
De novo transcriptome assembly, unigenes classification and functional annotation transcripts were then further clustered into unigenes covering >90% at the 5X reads by using CD-HIT-EST-4.5.4 software [29] for further downstream analysis. Coding sequences (open reading frames, ORFs) within the unigenes (default parameters, minimum of 100 amino acid sequence) were predicted by TransDecoder v5.0. The longest ORFs were then subjected to BLAST analysis against PSD, UniProt, SwissProt, TrEMBL, RefSeq, GenPept and PDB databases to obtain protein information resource (PIR) for the prediction of coding sequences by Blast2GO software program [30].

Functional annotation
The functional annotation of genes was performed by DIAMOND (BLASTX compatible aligner) program software [31]. The functional identification of coding sequences in biological pathways of the respective sample reads was assigned to reference pathways in KEGG (Eukaryotic database). The output of KEGG analysis included KEGG orthology, corresponding enzyme commission (EC) numbers and metabolic pathways of predicted CDS by using KEGG automated annotation server KAAS (http://www.genome.jp/kaas-bin/kaas_main) [32].

Differential gene expression analysis
The differential expressed genes (DEGs) were identified between the corresponding samples by implementing a negative binomial distribution model in DESeq package (v.1.22.1_http:// www.huber.embl.de/users/anders/DESq) [33]. The combination for differential analysis was calculated as SaSHc (high oil yielding) vs SaSLc (low oil yielding) S. album. To analyze the differentially expressed genes, two software's (heatmap, and Scatter plot) were used to predict upregulated and downregulated genes in S. album. A heat map was constructed by using the log-transformed and normalized value of genes based on Pearson uncentered distance and average linkage method. The most similar transcriptome profile calculated by a single linkage method, a heatmap were generated, correlating sample expression profiles into colours. The heatmap shows the level of gene expression and represented as log2 ratio of gene abundance between high and low oil yielding samples. An average linkage hierarchical cluster analysis was performed on top 50 differentially expressed genes using multiple experiments viewer (MeV v4.9.0) [34]. The colour represents the logarithmic intensity of the expressed genes. Relatively high expression values were showed in red (identical profiles) and low expression values were showed in green (the most different profiles). The scatter plot is used for representing the expression of genes in two distinct conditions of each sample combination i.e., high and low oil yielding clones. It helps to identify genes that are differentially expressed in one sample with respect to the corresponding samples. This allows the comparison of two values associated with genes. The vertical position of each gene in form the of dots represents its expression level in the high oil yielding samples while the horizontal position represents its expression level in the treated samples. Thus, genes that fall above the diagonal are over-expressed and gene that fall below the diagonal are under expressed as compared to their median expression level in experimental grouping of the experiment.

Quantitative RT-PCR analysis
Quantitative Real Time PCR was performed by using SYBR Green PCR master mix kit in a ste-pOnePlus Real Time PCR system (Applied Biosystem by Life Technologies, USA) to validate the gene expression profiles identified by RNA-Seq results. The cDNA was amplified in a 20 μL volume including 10 μL SYBR green PCR master mix, 2 μL cDNA, 2 μL of primers (1 μLfor each forward and reverse) and 6 μL of distilled water. RT-PCR master mix (TaKaRa). Six previously identified sandalwood oil biosynthesizing genes [13,35,36] (SaMTPS, SaFPPS, SaDSX, SaGGPS, SaGPS, and SaCYP450) specific primers were predicted using by the online tool Primer3 version 0.4.0 and synthesized at (Eurofins India Pvt. Ltd). The sequence of primers with a melting temperature between 60-61 0 C and a PCR product range of 151-229 bp were listed in (S2 Table). qRT-PCR was performed with stepOne Real time PCR system (Applied Biosystems, Thermofisher Scientific). The qRT-PCR reaction systems were as follows: 95 0 C for 20 s, followed by 40 cycles of 95 0 C for 5s, 60 0 C for 30s and 72 0 C 40 sec. Three replicates for each of the two biological replicates were performed. The transcript profiles were normalized using the reference housekeeping gene actin and the relative expression level of candidate genes were calculated with the 2 -ΔΔct standard quantitative method [37]. To compare the RNA-Seq and qPCR results, a linear correlation was calculated using the log2 of the normalized expression values. The fluorescence data were collected and analyzed with Step One analysis software.

Qualitative analysis of S. album oil
The selected core samples were quantitatively and qualitatively analyzed. The total oil percentage was found 4.96% and 0.93% for respective samples. Along with the oil content, α/β-santalol variation in SaSHc 59.30/32.21 and in SaSLc 49.52/26.60 was observed (S1 Table).

Library construction and transcriptome sequencing
A total of 38,785326 (SaSHc) and 35,94,4784 (SaSLc) raw PE reads were generated from the Illumina sequencing of S. album (Table 1). After removing adapters containing >5% unknown nucleotide sequences, ambiguous reads and low quality reads (reads with more than 10% quality threshold (QV) <20 phred score) 28,959187 and 25,598869 were obtained to respective samples. The total clean bases for SaSHc were 4.4 GB with 47.67% GC and 3.8 GB with 48.62% GC content for SaSLc. 141,781 clean pair-end reads were assembled into pooled non-redundant putative transcripts with the mean length of 1,149 bp followed by N50: 2,044. The obtained transcript length ranged from 201 to 15,872 (S3 Table). The transcripts were assembled into 31,918 unigenes with the mean length and N50 length 1,739 2,272 respectively (S3 Table). Of the unigenes we found 11.85% (3,785) 200-500 bp in length, 19.06% (6,085) were 500-1000 bp in length, 36.28% (11,582) were 1000-2000 bp in length, 19.35% (6179) 2000-3000 bp in length, 8.42% (2688) 3000-4000 bp in length, 2.96% (946) 4000-5000 bp in length and 2.04% (653) exceeded 5000 bp (S3 Table). A total number of coding sequences (CDS) in pooled samples were found 2.271 million with total 2.810 billion bp. (S3 Table). Sample wise number of CDS was in SaSHc and SaSLc was 2.12 million and 1.811 million followed by total CDS base length 2.657 billion in SaSHc and 2.307 billion (S3 Table). Gene functional annotation and classification Total 22,710 CDS were BLAST and 20,842 CDS were annotated by NCBI databases (S3 Table).  (Fig 1A-1C). GO annotations for molecular functions (SaSHc 13; SaSLc 12), biological process (SaSHc; 21, SaSLc; 22) and cellular component analysis SaSHc (16) and SaSLc (17) were plotted by WEGO plotting tool. These domains were further containing Cellular component and in Molecular functions followed by Biological process respectively. The number of differential expressed genes (DEGs) in biological regulation terms was observed 5,108 in SaSHc and 4,442 in SaSLc. Data showed that prominent GO terms in biological process were metabolic process, cellular process, biological regulation, localization, stimulus, cellular component organization or biogenesis and signaling. Similar result was observed in cellular components viz, SaSHc (4,168) and SaSLc (3,642). In cellular components, majority of GO terms was related to cell, cell part organelle, membrane enclosed lumen, membrane and protein containing complex related genes was overrepresented in SaSHc. In molecular function, the number of DEGs were involved in GO terms was 5,758 in SaSHc and 4,972 in SaSLc.
The DEGs were prominently participated in catalytic activity, binding, transport activity, molecule carrier activity, antioxidant activity, and signal transducer activity. Among cellular components, cytosol, intracellular part, cytoplasmic fraction and cytoplasm were overrepresented in SaSHc as compared to SaSLc accessions. High number of genes was found in SaSHc (41,900 genes) compared to SaSLc (36,571 genes) that was further classified into biological process, cellular component and molecular functions. Highest number of genes was functionally annotated and was observed in biological process (SaSHc 16,361) and (SaSLc 14,459) followed by molecular function (Fig 2A and 2B).

Kyoto encyclopedia of genes and genomes (KEGG) pathway mapping
Significant DEGs between SaSHc and SaSLc were mapped to reference canonical pathways in KEGG database. A total of 6,159 and 5,554 CDS of SaSHc and SaSLc were found to be categorized into 24 major KEGG pathways and were grouped in five main categories (Table 3). All assembled unigenes were subjected to further functional prediction and classification by KEGG Orthology (KO) database. Results showed 6,159 and 5,554 unigenes involvement in 24 groups in the KO database in respective samples and further subcategorized into 213 metabolic pathways (Table 3) (S1-S3 Figs). KEGG metabolite pathways represented 10 major pathways like metabolism, terpenoid synthesis, amino acid metabolism, purine metabolism, pyrimidine,

PLOS ONE
transcription, translation, amino acyl-tRNA biosynthesis, DNA replication and membrane transport in sandalwood ( Table 4). The EC numbers were classified in KEGG pathways, enabling the presentation of enzymatic functions in the context of the metabolic pathways. Among the identified pathways, secondary metabolite-flavonoid, and terpenoid related transcripts were over-represented (Table 4).

DEGs involved in sandalwood oil biosynthesis in S. album
DEGs were further annotated with KEGG database to deep insight the gene products for metabolism and functions related genes in different classified pathways. We performed an enrichment analysis of gene ontology (GO) terms with high significance in the upregulated DEGs. To identify metabolic pathways, SaSHc (297) and SaSLc (259) DEGs were mapped. As a result, 14 major pathways have been shown to play important role in sandalwood oil biosynthesis. Most pathways were resulted to secondary metabolites biosynthesis and metabolism by ) and leads to up-regulation metabolic pathways. All these Go terms can be connected with sandalwood oil biosynthesis through an enhanced production of gene products in S. album oil biosynthesis pathway ( Table 5).

Profiling of differential expressed genes (DEGs) participated in sandalwood oil biosynthesis regulation
All stages of sandalwood oil biosynthesis were examined, and a comparative analysis was done using aligned reads and the transcripts were grouped based on their degree of expression  (log2FC). 16,665 differentially expressed genes were commonly detected in both the accessions (SaHc and SaSLc). The results showed that 784 genes were upregulated and 339 genes were downregulated in high oil yielding accessions whilst 635 upregulated 299 downregulated in low oil yielding S. album accessions (Fig 3). Total biological process associated with DEGs (9 upregulated and 7 downregulated) in SaSHc and SaSLc sandalwood accessions were identified in which highly upregulated genes were identified in SaSHc energy metabolism followed by secondary metabolite biosynthesis (Table 6). Genes related to oil biosynthesis in S. album have been commonly expressed and previously listed in gene expression pattern represented by Scatter plot showed a significant log 2FC>16.0; P value <0.005 for upregulated genes and log 2FC<0.40; P value <0.005 downregulated in case of SaSHc sample (Table 7). Approximately 4.39% genes were found upregulated and 1.87% was downregulated in total differentially expressed genes. The normalized gene expression values from both the samples were used to estimate a Euclidian distance matrix based on transcript describing the similarities between the SaSHc and SaSLc samples. The red dots represented the upregulated genes and green dots represented the down regulated in DGE combination (Fig 4). Similar to scatter plot, based on their degree of expression (log2FC) values, Heatmap were also used to generate DEGs pattern. Heatmap showed transcript abundance level and indicated a similarity gradient between the SaSHc and SaSLc accessions. In Heatmap, gene expression was calculated in accordance with the method of FPKM, which takes into account the influence of both the sequencing depth and gene length on read count. In the FPKM distribution for selected samples, SaSHc showed the highest probability density distribution of gene expression, whereas, SaSLc displayed the lowest (Fig 5). The transcripts, which were highly expressed, were annotated for each gene as a high number of fold change and measure primarily the relative change of expression level. The top 50 highly upregulated genes (log2 FC 4.65-9.285) were shown in the Heatmap (Fig 5).

PLOS ONE
Transcriptome profiling of contrasting Sandalwood (S. album L.) accessions

PLOS ONE
The transcriptional mining identified ten unigenes participated in sandalwood oil biosynthesis with the upregulated relative gene expression log2FC viz,  (Table 8 and  S4 Table).

Phylogenetic analysis of identified cytochrome family in RNA-seq of S. album
Cytochrome P450 mono-oxygenases putatively involved in sandalwood oil biosynthesis (Diaz-Chavez et al. 2013 [18]). In order to phylogenetic analysis of cytochromes, BLAST was performed on pooled RNA-seq data and total 237 cytochrome genes (FC 6.87-0.234) were listed in which 84 cytochrome genes were observed with FC>1.0. Based on their structures, total nine groups of cytochrome gens were resulted i. Cytochrome b561 ii. Cytochrome P450 iii. Cytochrome c oxidase iv. Cytochrome P45076C2 v. Cytochrome c oxidase subunit1 vi. NADH-cytochromeb5 reductase vii. SaCYP736A167 viii. mitochondrial cytochrome b and ix. Cytochrome-P450 E-class (S6 Table).

Distribution of shared gene clusters across plant species
In the current study, majority of the blast hits were found to be against Vitis vinifera, Quercus suber, Juglans regia, Nelumbo nucifera, Thobroma cacao, Ziziphus jujuba, Hevea brasiliensis, Manihot esculenta and Jatropha curcus (Fig 6). BLAST

Validation of the expression profiles of candidate genes involves in high oil biosynthesis of sandalwood by real time PCR (q-PCR)
To validate the expression profiles of candidate genes obtained from the RNA-Seq analysis, six candidate genes relate with oil biosynthesis in the transition zone of sandalwood were selected for qRT-PCR analysis. The expression levels of the selected genes were compared with RNAseq results. The expression patterns of RNA-Seq and qRT-PCR revealed that the expression pattern of these genes were consistent which indicated the reliability of the RNA-seq data (Fig 7).

Discussion
Sandalwood oil have a wide variety of uses including perfumery, pharmaceutical and toiletries, which makes understanding the regulation of high essential oil biosynthesis in sandalwood tree highly important [38,39]. Due to unregulated harvesting many natural sources of elite sandalwood trees have been exhausted, in response to this sandalwood tree plantations have been established in many region of southern India. It has been reported that the oil content of sandalwood varies tree to tree with a negative correlation [8]. In this study we identified sandalwood trees of high and low oil content (SaSHc and SaSLc). Sandalwood oil biosynthesis and its regulation are extensively documented in sandalwood [18][19][20]. Our objective was to identify the transcriptomic responses considering high and low oil yield sandalwood grown in similar field condition. This study was focused on identifying genes involved in high oil biosynthesis and its regulation. In recent years, RNA-seq has been extensively employed for sandalwood oil biosynthesis pathway [18][19][20]35]. To understand the dynamic regulation of oil accumulation and concentration variation, a comparative De novo transcriptome profiling of two identical accessions that differ significantly in oil content was carried out. Using this, we tried to infer the effect of change in gene structure difference in sandalwood accessions and underlying the molecular mechanism is important for developing high oil yielding cultivation of sandalwood (SaSHc and SaSLc). Various approaches for functional annotation of the assembled transcripts have been used to identify the genes in which mostly were involved in secondary metabolite biosynthesis in sandalwood. Functional annotation of assembled 6159

PLOS ONE
and 5,554 CDS of high and low oil yielding accessions of sandalwood showed high number of CDS were involved in carbohydrate metabolism followed by genetic information (Table 3). Based on the functional annotation enrichment analysis of the differentially expressed genes, identified some overrepresented genes participated in high oil biosynthesis with the highest 96.46% similarity in cytochrome b560 and Cytochrome b561 containing protein At3g61750 with 67.43%. It is generally accepted that identification of orthologous gene clusters helps in taxonomic and phylogenetic classification. We identified, 11,013 orthologous gene clusters, suggested their conservation in the ancestry. GO analysis of both accession showed that the majority of genes are enriched in molecular function and biological process ( Table 2). Interestingly, a large fraction of genes involved in response to amino acid metabolism and transcription pathways (Table 4). A more number of unigenes were found to be involved in secondary metabolite biosynthesis in sandalwood was identified by mapping of unigenes against the KEGG pathway database (Table 4). Total number of unigenes including isoprenoid and

PLOS ONE
putative terpenoid pathway genes were involved in the secondary metabolite biosynthesis. By comparing the transcriptome profiles of high and low oil yielding heartwood, high and low oil yielding heartwood, 31,918 unigenes were identified of DEGs (S3 Table). Among them the expression level of E-E Fernasyl diphosphate synthase/ Fernasyl diphosphate synthase in the terpenoid backbone biosynthesis was upregulated in the comparison (Table 8) indicating increased supply of precursors for diterpene biosynthesis. Other genes of interest emerged from this study, which include several members of the transcription factor family. 41 and 47 were all upregulated in the heartwood with high and low oil content accessions sandalwood (Table 9). In another study of sandalwood, 58 families of transcription factors were observed [21]. However, we were unable to detect some of the transcription factors in our data. The quantitative variations in the essential oil content in high and low oil yielding sandalwood accessions can be due to terpene synthase and other secondary metabolite related gene expression and regulation. The result of annotation indicated that more than 51% of assembled unigenes of sandalwood matched with the genomic database of other plants (Fig 6). The unigenes identified in this research, had a higher annotation percentage against the Vitis vinifera (25%) genome database compared to other plant database (Fig 6). KEGG analysis showed that the terpenoid synthesis and metabolism pathway was significantly enriched (Table 4) in sandalwood. A total 216 transcripts were involved in terpenoid biosynthesis. However, 181 transcripts were involved in low oil yielding sandalwood accession. Other transcriptome studies in sandalwood have detected low number of genes involved in oil biosynthesis [18][19][20][21]35]. However no significant expression of genes directly involved in high oil biosynthesis was reported in Sandalwood. We identified selected candidate genes, which were specifically expressed in SaSHc (Table 8) along with previously identified genes [18,19,21,36]. The lower number presented in our data set is likely because core tissue of sandalwood were used for transcriptome analysis. The oil biosynthesis genes were abundantly expressed in SaSHc when compared to SaSLc accessions and validated the participation of genes in high oil biosynthesis Table 5. We observed SaCYP736A167 in our predicted gene sets, which identified as a candidate key oil biosynthesis gene in S. album in previous reports [18]. In this study identified a cohort of genes in the terpenoid backbone biosynthesis and monoterpenoid biosynthesis pathway that were commonly upregulated in heartwood of high oil content sandalwood compared to low oil content sandalwood. The knowledge obtained from this study could facilitate manipulation of sandalwood essential oil production through metabolic engineering of essential oil biosynthesis.

Conclusion
The comparative analysis of the sandalwood oil accumulating core tissues of sandalwood showed that transcriptional regulation plays a key role in the considerable differences in oil content between high and low oil yielding sandalwood. To the best of our knowledge, this is the first study reporting the comparative transcriptomic response of sandalwood using RNA--Seq approach and identified different group of genes in high oil yielding samples under the similar condition. The present study generated a well-annotated pair end read RNA libraries and the results unveiled genome wide expression profile of sandalwood oil biosynthesis. Analysis of transcriptome data sets, identified transcripts that encode various transcription factor, metabolism of terpenoids, environment response element and biosynthesis of other secondary metabolites. Nevertheless, we also discovered some of the oil biosynthesis candidate genes SaCYP736A167, DXR, DSX and FPPS genes that participates in sandalwood oil biosynthesis and accumulation of oil in heartwood. The results suggested an intricate signalling and regulation cascade governing sandalwood oil biosynthesis involving multiple metabolic pathways.